PERFORMANCE EVALUATION

THE USE OF SCORING SYSTEMS IN ADAPTIVE TRAINING
By

Squadron Leader C. J. Hyatt, RAF and Captain G. H. DeBerg, USAF



INTRODUCTION 



This paper is intended to describe work which we have done at the
Crew Station Design Facility in the field of scoring complex tracking
tasks. It is intended to provide a down-to-earth approach to the
development of a philosophy for scoring systems without getting deeply
involved in mathematical approaches, analysis, or results. The typical
sort of tracking task we might look at in the facility is landing an
aircraft in Instrument Flying Conditions, using various forms of approach
aid, and with various task loadings.

The approach we use to this problem will obviously read across fairly
readily to many other areas of study involving the use of motor skills.
Our work to date has been entirely in the field of developmental studies
rather than training, but since both these processes are amenable to
treatment by an adaptive loop approach, they have many features in common,
and in particular they both need a meaningful scoring system. Future
projects which will require us to develop scoring systems specifically
for adaptive training are the Simulator for Air to Air Combat and the
Advanced Simulator for Undergraduate Pilot Training.

Figure 1 shows how the Crew Station Design Facility fits into the
ASD organization. It is part of the Directorate of Crew and AGE
Engineering, which is responsible for providing an advisory service to the
individual systems program offices of ASD. You will see that we are in
the same division as the simulator branch, and we also have simulators
of our own, used almost exclusively for experimental work.

As the emphasis on simulators in training increases, and these
simulators become more sophisticated, the need for sophistication in our
scoring systems increases, and over the last year we have found it
necessary to educate our 'customers'. This paper is a distillation of
our studies.

GENERAL DISCUSSION OF PHILOSOPHY

I want to start by looking at some definitions: working definitions
rather than academic ones. The first of these is what we mean by an
adaptive process. Figure 2 shows how we have defined it for our purposes
in very general terms.

Figure 3 shows how this process can be represented by a classic three
block loop, and I think most workers in this field would accept this
general concept. More specifically, in the context of training, the
three blocks can be more precisely defined as shown in the lower half
of the figure. There already appears to be a fair consensus of opinion
that the quality of the performance measure is the 'make or break'
feature of the system. Most people would agree that it is unfortunately
also heavily subjective. To make the performance measure as good as
possible let us first look at what we want from a scoring system. We
have made a list (Figure 4) of qualities which appear to be important:

The first and most important is that the scoring system should be
directly related to the objectives of the adaptive process. To ensure
this, we have to persuade the training or experimental director to
define his objectives clearly.

The remaining qualities are not in any particular order. If the
objectives are multiple, each aspect must have some element of the
scoring system directed toward it.

Many parameters may be collected during a study, and the mass of
data can be confusing. It is necessary to reduce this to manageable
and comprehensible proportions.

Although some subjective decisions must be made in the formulation 
of the scoring system, the user should not have to make any when 
applying it - ideally a computer should be able to handle it.

Having achieved a numerical value for the score, it should be
possible to relate the various ranges to an acceptability index,
for example: Excellent, Good, Average, Poor, Unacceptable.
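
As a minimal sketch of such an acceptability index, the Python below maps a numerical score onto the five labels named above. The band edges, and the convention that the score is a penalty on a 0 to 100 scale where lower is better, are illustrative assumptions; the paper gives no numerical ranges.

```python
from bisect import bisect_right

# Hypothetical band edges on an assumed 0-100 penalty scale (lower is better).
# The paper names the labels but does not give numerical ranges.
BAND_EDGES = [10.0, 25.0, 50.0, 75.0]
BAND_LABELS = ["Excellent", "Good", "Average", "Poor", "Unacceptable"]


def acceptability(score: float) -> str:
    """Return the acceptability label for a penalty score."""
    # bisect_right counts how many band edges are <= score,
    # which is exactly the index of the band the score falls in.
    return BAND_LABELS[bisect_right(BAND_EDGES, score)]
```

Because the mapping is a plain table lookup, the user makes no subjective decision when applying it; the subjective judgment lies entirely in choosing the band edges beforehand.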

For the greatest benefit to be obtained from the adaptive system,
knowledge of results is a valuable aid, and hence it is desirable,
if possible, for the scores to be generated in real time.

To fully apply these principles we feel that it is necessary to
develop the performance measure block of the classic adaptive loop
(Figure 5). The classic adaptive training loop calls for the application
of the philosophy of training in the formulation of the adaptive logic.
This means simply how the next step of training is governed by present
performance. Because this is the only block labelled 'Logic' there is
a tendency to try to use this step to insert logic related to the
training objectives. We believe that the performance measure block
should be developed into three separate stages as shown in the lower
part of the figure, and that the value logic and adaptive logic should
be kept entirely separate from each other. It is these three steps
that we want to concentrate your attention on. In simple and direct
terms these three steps can be relabelled as shown in Figure 6. Our
aim is to develop the philosophy which should be applied to the generation
of the value or scoring logic step in this process.

DEVELOPMENT OF SCORING LOGIC

Having defined the concept of 'scoring logic', it is easy to see that
this logic is the primary link between the raw data which can be obtained
by monitoring performance, and the score which is used to determine
subsequent progress.

It is the sole point in the loop at which the recorded performance
is weighed against the fundamental objectives of the process. As a
result, the quality of the scoring logic is the key factor in determining
whether or not the adaptive cycle is efficiently directed toward the
aims of the process, be it an experimental study or a training program.

It is only too easy to skip over the question of objectives when
defining the logic, and there is, perhaps, an even more insidious risk
of using existing scoring logic 'because it worked well last time'.
Scoring systems tend to look similar to each other, especially those
used in any one particular field of endeavor, and subtle but vital
differences can go unnoticed.

One way of minimizing this risk - which we believe is a worthwhile
investment of time - is to carry out an objective evaluation of the
true aims of every new scoring system we devise, and to develop a sound
rationale for the scoring logic. Better still, this rationale should
be formally written up and included as an integral part of our
description of the scoring system.

Let us then look at the various ways in which the aims of the
adaptive process we are considering can affect the way we go about scoring
it. There are a number of questions we have to ask ourselves, and some
of the major ones are outlined in Figure 7.

The first - and in our developmental studies the most fundamental -
question, and yet oddly enough the one most frequently overlooked, is
which part of the man-machine system are we looking at: the man, or the
machine, or the interface between them.

In the training context the answer is simple - the man, that is to
say the trainee, is paramount. I would like to digress for a moment,
though, and speculate on the wealth of data which has, at one time and
another, been collected within adaptive training systems and which, if
it is still stored, could have potential value for the study of the
training machines used, or of the way they display information. It
seems probable that much of this data might be tapped simply by running
it through new scoring systems with appropriate changes in their aims -
assuming that the original scoring system was correctly aimed at training.
Much valuable information on the merits or shortcomings of various
monitoring or operating consoles currently in use might be acquired in
this way.

However, as I said, this is a digression, and in the training
context it is the trainee we are trying to assess.

The next question we have to ask is what are we trying to find out
about the trainee. The basic answer here is obvious: we want to know
how well he performs. But to decide how to measure this performance,
we need to ask a number of subsidiary questions.

For example, we have to decide which of the parameters available
to us - which means those parameters we can measure without undue
expense - are relevant to performance. And the performance we use as
a yardstick here must itself be performance which is directly relevant
to the training objectives.

A good example of the sort of decision we have to make in this area
arises when scoring a tracking task such as an Instrument Landing
Approach. Should we consider distance along track as a measure of
performance? This distance is one way of looking at the trainee's
control of his speed; not just his instantaneous speed, but the integral
of his speed with respect to time from the start to the present. Thus
if he makes an error in speed, to minimize his resulting penalty score
he must not only correct the speed, but make a suitable compensatory
adjustment to bring him back to his correct position along track. The
argument against using this parameter for scoring is that as long as
the trainee stays on the correct line through space as defined by the
landing system, it does not matter when he gets to various points along
it. However, this obviously depends on the scenario. If the object
of the exercise is not only to fly along the correct line and land at
the correct point, but also to land at a specific time to fit in with an
existing traffic pattern, then the accuracy of his position along track
must be considered of some importance.
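
The point about the integral of speed can be written down compactly. The symbols here are our own shorthand and do not appear in the paper: v(t) is the speed achieved along track, v_ref(t) the scheduled speed, and e_x(t) the along-track position error at time t.

```latex
\[
  e_x(t) \;=\; \int_0^{t} \bigl( v(\tau) - v_{\mathrm{ref}}(\tau) \bigr)\, d\tau
\]
```

Returning to v = v_ref only stops e_x from growing any further. To bring e_x back to zero the trainee must hold a speed deviation of the opposite sign for long enough to cancel the accumulated integral, which is the compensatory adjustment described above.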

Another decision we have to make, which is also related to what we
are trying to find out about the trainee's performance, is whether we
should be concerned with his continuous operating or tracking ability,
or only with his ability to reach a certain point by any means at his
disposal. This quite clearly will determine whether we want a continuous
scoring system or what we term 'gate' scoring - that is to say a measure
of his ability to pass through a gate in space, or perhaps a series of
gates.

I realize that you may be thinking that the points we are making
here are overly simple and obvious. They are simple, and they should
be obvious, but we think it is vitally important to emphasize that a
cold-blooded analysis of this sort should be made, rather than the sort
of approach which we know from experience often does occur, based on the
principle of "Well, we usually score parameters A, B, and C - it looks
as if they should be OK again this time."

Having, we hope, established which parameters we want to select from
the raw data, we next have to think how the values of these parameters
can best be used to give us a measure of performance. The logical
approach is to compare these achieved values with some ideal. Most
scoring systems adopt this approach, but once again a vital step which
may be missed is to ensure that the ideal we set is truly relevant to
our objectives. It is useless to set an ideal value of some parameter
at 100 feet plus or minus zero when we know that the 'Ace of the base'
can only achieve 100 feet plus or minus ten feet, and plus or minus
twenty feet is quite adequate for routine performance of the task. This
is also a useful point at which to consider the limitations of our
measured data; we can run into serious problems if we try to score to
the nearest foot when the equipment - and I include here both the
instructor's monitoring equipment and the trainee's operating equipment -
can only measure to the nearest five feet.
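
A minimal sketch of these two ideas together, assuming a simple treatment that the paper does not itself specify: the raw deviation is first rounded to the resolution of the equipment, and only the part lying outside a tolerance band around the ideal counts as error. The function name and all numerical values are illustrative.

```python
def divergence(achieved: float, ideal: float,
               tolerance: float, resolution: float) -> float:
    """Error beyond the tolerance band, quantized to the measurement resolution.

    tolerance  -- half-width of the band regarded as 'as good as ideal'
    resolution -- smallest step the equipment can actually measure
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    raw = abs(achieved - ideal)
    # Do not pretend to more precision than the equipment provides:
    # round the raw deviation to the nearest multiple of the resolution.
    quantized = round(raw / resolution) * resolution
    # Deviations inside the tolerance band attract no penalty at all.
    return max(0.0, quantized - tolerance)
```

With an ideal of 100 feet, a tolerance of ten feet and a resolution of five feet, an achieved value of 112 feet yields zero error, while 127 feet yields an error of 15 feet.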


Even having established an appropriate ideal and a valid measure
of divergence from it, we still need to consider what the implications
of this divergence are.

First of all we must look at the relationship between size of
divergence and importance - for example in a certain situation a five
foot error could be acceptable, a ten foot error merely embarrassing,
but a fifteen foot error fatal. Clearly this is not a linear
relationship - at least not in terms of human values - and our score
should reflect this. In an extreme case of this sort of situation, any
error less than ten feet might be totally acceptable, while anything in
excess of ten feet would be totally unacceptable.
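
One way to express a non-linear relationship of this kind is sketched below. The five foot and fifteen foot thresholds come from the example above; the quadratic rise between them and the 0 to 1 penalty scale are assumptions made purely for illustration.

```python
ACCEPTABLE_FT = 5.0   # errors up to this size carry no penalty (from the example)
FATAL_FT = 15.0       # errors of this size or more carry the full penalty


def penalty(error_ft: float) -> float:
    """Map an error in feet onto a 0-1 penalty that grows faster than linearly."""
    size = abs(error_ft)
    if size <= ACCEPTABLE_FT:
        return 0.0
    if size >= FATAL_FT:
        return 1.0
    # Assumed shape: quadratic rise between the two thresholds, so that
    # the second five feet of error cost far more than the first.
    fraction = (size - ACCEPTABLE_FT) / (FATAL_FT - ACCEPTABLE_FT)
    return fraction ** 2
```

Under these assumptions a ten foot error scores 0.25 and a fifteen foot error scores 1.0. Setting both thresholds to ten feet removes the branch between them and gives the all-or-nothing extreme case.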

This is an example of what is generally known as 'time on target'
scoring. In some contexts, such as air to air combat with guided
missiles, it may be a perfectly adequate measure of performance, although
even here it is more suited to competitive scoring than to training.
But for a complex tracking task it contains too little data to help the
instructor or trainee to identify areas of weakness and plan remedial
training accordingly.
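
A short sketch shows why time on target carries so little diagnostic information. It assumes equally spaced samples of a single error value; the function name and the sample data are hypothetical.

```python
from typing import Sequence


def time_on_target(errors: Sequence[float], limit: float) -> float:
    """Fraction of equally spaced samples whose absolute error is within limit."""
    if not errors:
        raise ValueError("no samples to score")
    on_target = sum(1 for e in errors if abs(e) <= limit)
    return on_target / len(errors)


# Two very different runs, scored against a ten foot limit.
steady_but_marginal = [9.0, 9.0, 9.0, 30.0]
accurate_with_one_slip = [0.0, 0.0, 0.0, 11.0]
```

Both runs score 0.75, although the first hovers at the edge of the limit and ends with a gross error while the second is almost perfect. Nothing in the score tells the instructor which kind of weakness to work on.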

Another thing we have to take into account when assessing the
divergence of the achieved performance from the ideal is the fact that
some parameters may be much more critical than others, and if we want
a scoring system which assigns equal penalties to equally unacceptable
errors, we must weight the measured errors accordingly.
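
One simple way to obtain equal penalties for equally unacceptable errors is to divide each measured error by the magnitude judged unacceptable for that parameter. The sketch below assumes this approach; the parameter names and limits are invented for illustration and are not taken from the paper.

```python
from typing import Dict

# Hypothetical magnitudes at which each error is judged equally unacceptable.
UNACCEPTABLE = {"altitude_ft": 20.0, "lateral_ft": 100.0}


def weighted_errors(errors: Dict[str, float]) -> Dict[str, float]:
    """Express each error as a fraction of its own unacceptable magnitude."""
    return {name: value / UNACCEPTABLE[name] for name, value in errors.items()}
```

A ten foot altitude error and a fifty foot lateral error then both come out at 0.5, so they contribute equally to any score built from them.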

A FORMAL PROCESS FOR CREATING A SCORING SYSTEM 

I now want to recapitulate on the various sorts of decision we have
discussed, and in so doing I want to formalize and develop a process
for creating an effective scoring system.

I want to show how, starting from the raw data available to us,
we can adopt a systematic approach to mould it to our purposes, and
ensure that our training objectives are met.

There are various processes we can apply to a mass of raw data, but
we believe that six of these processes, shown in Figure 8, if applied
in sequence will go far toward producing a sound system.

The first step is to select the parameters which are relevant to
our training objectives.

Next we must look at what data is available to us on these
parameters and edit it. By editing it, I mean deciding which values of it
to use, and the choice here ranges from using it all, through using it
at regular intervals, to using it only at specific points which we
consider relevant.

We must then compare this edited data with what we believe to be
the ideal values we are seeking to achieve by our training program. This
comparison will give us what we have chosen to term 'error values'.

These error values must now have two processes applied to them. 
These are Modification and Weighting. The dividing line between them 
is not clear cut, and for this reason - to avoid lengthy discussion
of what each comprises - I will not separate them. The operations
which I include under these headings are: 

Ensuring that the error values reflect the trainee's performance
rather than any shortcomings in the training equipment.

Ensuring that the size of the error and the importance of this
size are suitably reflected in the score.

Ensuring that the more critical parameters carry appropriately
heavier scoring penalties.

Ensuring that any inter-relation between parameters is accounted
for - this for example would include any weighting in respect of
range if this were considered relevant.

The result of modifying and weighting the error values is to 
produce what we call 'scoring elements'.

Finally the scoring elements we have arrived at can be combined
to give a single comprehensive score, or they may be combined in groups
to give sub scores related to particular parts of the training objective,
or particular capabilities of the trainee. Part of this combination
process will be normalization of the score with respect to time or
distance if this appears to be appropriate.

We believe that this systematic approach gives the best chance of
achieving a useful and meaningful scoring system.
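
The six processes can be sketched as one short function. Everything concrete here is an assumption made for illustration: samples are dictionaries of parameter values, editing is done by taking every `stride`-th sample, modification defaults to squaring the error, and normalization divides by the number of samples used. The paper describes the sequence but not this implementation.

```python
from typing import Callable, Dict, Sequence


def score_run(
    samples: Sequence[Dict[str, float]],
    ideals: Dict[str, float],
    weights: Dict[str, float],
    stride: int = 1,
    modify: Callable[[float], float] = lambda e: e * e,
) -> float:
    """Apply select, edit, compare, modify, weight and combine to raw samples."""
    # 2. Edit: keep every stride-th sample (stride=1 uses all the data).
    edited = samples[::stride]
    if not edited:
        raise ValueError("no samples to score")
    total = 0.0
    for sample in edited:
        # 1. Select: only parameters listed in `ideals` are scored;
        #    anything else in the sample is ignored.
        for name, ideal in ideals.items():
            error = sample[name] - ideal              # 3. Compare -> error value
            element = weights[name] * modify(error)   # 4, 5. Modify and weight
            total += element                          # 6. Combine scoring elements
    # Normalize so that runs of different length can be compared.
    return total / len(edited)
```

Keeping `ideals`, `weights` and `modify` as arguments means the value logic can be changed for a new study without touching the mechanics of the loop.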

PRACTICAL EXAMPLES

If our systematic approach is applied to a variety of training
programs, a variety of scoring systems will naturally result, but in all
probability they will fall into four or five broadly defined groups;
for example time on target systems, cumulative error systems, gate score
systems, or combinations of these. Within any of these groups, one
can postulate a generalized scoring system which by manipulation of a
series of constants can be used for various slightly different training
tasks. This concept lends itself very well to a computer based scoring
system within an evolving organization - for example a pilot training
school - where the training objectives remain fairly constant but the
equipment used, and the associated operating procedures, will probably
evolve steadily over the years.

To give you a good example of this, I will offer a brief outline
of the type of scoring system we are currently using in our
developmental studies to assess a pilot's ability to carry out landing
approaches under instrument flying conditions. Remember of course that
in these studies our aim is to assess the man-machine interface. This
does not affect the process of scoring, but only the detailed application
of the scoring logic.

We have concluded that the appropriate method to adopt in this
instance is continuous scoring, with the score a function of size of
error. Time on target scoring simply does not tell us enough about
the things we need to know. However we do also take several specific
gate scores at appropriate points en route, depending on the nature of
the particular study. Our terms are defined in Figure 9, and a
generalized formula for this continuous scoring is shown at Figure 10.

You will see that this formula allows for easy adaptation to suit 
changes in the relative importance of the different errors by adjusting 
the constants K, changes in the relationship between magnitude and 
importance of errors by adjusting the functions of the error values,
changes in normalization philosophy by adjusting the function of time, 
and changes in weighting for range by adjusting the range function. 
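
Figure 10 is not reproduced in this text. Purely as an illustration of the four adjustable ingredients just listed, one form consistent with the description is given below. The notation and the way the terms are combined are our assumptions, not the formula in the figure: K_i are the weighting constants, f_i the functions of the error values e_i(t), h the range function of range r(t), and g the function of elapsed time T used for normalization.

```latex
\[
  S \;=\; \frac{1}{g(T)} \int_0^{T} h\bigl(r(t)\bigr)
          \sum_i K_i \, f_i\bigl(e_i(t)\bigr) \, dt
\]
```

In this form, taking f_i(e) = e^2 gives a square law penalty, h = 1 removes range weighting, and g(T) = T normalizes the score with respect to time.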

To illustrate how this type of formula has been applied, I will use
the Microwave Landing System as an example. The object of this exercise
is to ascertain whether or not certain tracks in airspace can be flown,
on a time schedule, in a safe and efficient manner. Different routes
are to be flown, and the object of the scoring system is to indicate
the relative desirability of each route. Some typical routes are shown
in Figure 11.

An MLS route differs from the standard ILS approach in several ways.
First, it is time dependent. The pilot must be in the right place at the
right time. Also he must fly several different headings, some on a
command basis, some by dead reckoning. Finally the descent rate is not
constant, but depends upon which portion of the approach is being flown.

Scoring these approaches depends upon our value logic (as is true
in any scoring system). This logic is based upon the objectives defined
for the study. As initially set forth, the purpose of MLS approaches is
to control aircraft throughout a given airspace both with respect to
time and spatial orientation relative to the prescribed path. This
leads to the following logic:

(1) The aircraft must be equally controlled throughout the
airspace.

(2) The aircraft must be in the "right place at the right
time."

(3) For safety considerations, the further the aircraft is
from track the closer it approaches a critical situation.

(4) Some types of error are more critical than others (for
example, low on altitude is worse than high on altitude).

(5) Different tracks must be compared to one another on the
same basis.

Applying these criteria, all possible parameters are analyzed and
either discarded or modified and included in the score as considered
appropriate. The formula shown in Figure 12 is the result. No range
weighting is included because the pilot must fly just as accurately at
great distances as he does in close. Time of arrival considerations are
met by using along track error, which is time dependent. Hence the error
in this direction is taken to be the distance of his actual position
from where he should be. The safety consideration states that big
errors can be critical, hence a square law is used. This penalizes
large errors very heavily. The weighting constants were arrived at
subjectively by discussion with qualified personnel as to the relative
importance of different types of errors to be considered in a safe
approach. Finally, in order to compare different tracks the entire
score is normalized with respect to time. The resulting equation can
be taken as a whole or by parts to examine the quality of the approach.
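
The logic described above can be sketched as follows. This is not the formula of Figure 12, which is not reproduced in this text. The choice of three error components, the sign convention for altitude error, and all the weighting constants are assumptions made for illustration.

```python
from typing import Sequence, Tuple

# One sample: (along_track_error, cross_track_error, altitude_error).
# Assumed sign convention: a negative altitude error means the aircraft is low.
Sample = Tuple[float, float, float]


def mls_score(
    samples: Sequence[Sample],
    dt: float,
    k_along: float = 1.0,
    k_cross: float = 1.0,
    k_high: float = 1.0,
    k_low: float = 2.0,   # hypothetical: being low is penalized more than being high
) -> float:
    """Time-normalized, square-law penalty score for one approach (lower is better)."""
    if not samples or dt <= 0:
        raise ValueError("need at least one sample and a positive time step")
    total = 0.0
    for along, cross, alt in samples:
        k_alt = k_low if alt < 0 else k_high
        # Square law: large errors are penalized very heavily.
        # No range term: accuracy matters equally at all distances.
        total += (k_along * along**2 + k_cross * cross**2 + k_alt * alt**2) * dt
    # Normalize by elapsed time so routes of different duration can be compared.
    return total / (len(samples) * dt)
```

Summing each of the three terms separately instead of together would give the sub scores needed to examine the approach by parts.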

CONCLUSION

To conclude, we do not claim to have found all the answers, but
we do feel we have gone a long way toward asking the right questions.

We have described a learning process that we have gone through, and
which we suspect many people go through in the process of producing
scoring systems. It has been a salutary experience for us to formulate
our ideas into a systematic process. We hope that our presentation
today will stimulate interest in the process and perhaps save others
some of the time we have spent.

Remember: to produce an effective scoring system three things are
essential. Good raw data must be collected; good value logic, or
scoring logic, must be applied to it; and the resulting components must
be assembled in a practical manner. Figure 13 shows how our proposed
sequence of operations generates the first three steps of the five
step adaptive loop. The common thread running through the whole
process is relevancy - we must constantly ask ourselves if our scores
relate to our aims.

We also hope that we have given you some food for thought, and that 
some of you will feel like contributing your own ideas in discussion 
now. Perhaps we may discover a few more of the answers to the questions 
we have posed. Thank you.